ESPACE: Accelerating Convolutional Neural Networks via Eliminating Spatial and Channel Redundancy

Authors

  • Shaohui Lin
  • Rongrong Ji
  • Chao Chen
  • Feiyue Huang
Abstract

Recent years have witnessed the extensive popularity of convolutional neural networks (CNNs) in various computer vision and artificial intelligence applications. However, the performance gains have come at the cost of substantially higher computational complexity, which prohibits their use in resource-limited settings such as mobile or embedded devices. While increasing attention has been paid to accelerating the internal network structure, the redundancy of the visual input is rarely considered. In this paper, we make the first attempt to reduce spatial and channel redundancy directly from the visual input for CNN acceleration. The proposed method, termed ESPACE (Elimination of SPAtial and Channel rEdundancy), works in three steps. First, the 3D channel redundancy of convolutional layers is reduced by a set of low-rank approximations of the convolutional filters. Second, a novel mask-based selective processing scheme is proposed, which further speeds up the convolution operations by skipping non-salient spatial locations of the visual input. Third, the accelerated network is fine-tuned on the training data via back-propagation. The proposed method is evaluated on ImageNet 2012 with implementations on two widely adopted CNNs, i.e. AlexNet and GoogLeNet. In comparison to several recent methods of CNN acceleration, the proposed scheme demonstrates new state-of-the-art acceleration performance, with speedups of 5.48× and 4.12× on AlexNet and GoogLeNet, respectively, and a minimal decrease in classification accuracy.

Introduction

In recent years, convolutional neural networks (CNNs) have demonstrated impressive performance in various computer vision and artificial intelligence applications, such as object recognition (Krizhevsky, Sutskever, and Hinton 2012; Simonyan and Zisserman 2014; Lecun et al. 1998; Szegedy et al. 2015; He et al. 2015), object detection (Girshick et al. 2014; Girshick 2015; Ren et al. 2015), and image retrieval (Gong et al. 2014b). Cutting-edge CNNs are computationally intensive, and their speed is limited mainly by the convolution operations in the convolutional layers [1]. For example, an 8-layer AlexNet (Krizhevsky, Sutskever, and Hinton 2012) with about 600,000 nodes costs 240MB of storage (including 61M parameters) and requires 729M FLOPs [2] to classify one image of size 224 × 224. This cost is further intensified in deeper CNNs, e.g. a 16-layer VGGNet (Simonyan and Zisserman 2014) with 1.5M nodes costs 528MB of storage (including 144M parameters) and requires about 15B FLOPs to classify one image. Under such circumstances, existing CNNs cannot be directly deployed in scenarios that require fast processing and compact storage, such as streaming or real-time applications.

On one hand, CNNs with million-scale parameters tend to be over-parameterized and computationally heavy (Denil et al. 2013). Therefore, not all parameters and operations (e.g. convolution or non-linear activation) are necessary for producing a discriminative decision. On the other hand, it is quantitatively shown in (Ba and Caruana 2014) that neither shallow nor simplified CNNs provide performance comparable to deep CNNs with billion-scale online operations. Therefore, to accelerate online CNN predictions without significantly decreasing the decision accuracy, a natural idea is to discover and discard redundant parameters and operations in deep CNNs.

Accelerating CNNs has attracted research attention very recently, and most of this work focuses on accelerating the convolutional layers, which are the most time-consuming part of CNNs. In the literature, the related works can be categorized into four groups, i.e. designing compact convolutional filters, parameter quantization, parameter pruning, and tensor decomposition.

Designing compact convolutional filters. Using a compact filter for convolution can directly reduce the computation cost. The key idea is to replace the loose and over-parameterized filters with compact blocks to improve speed, which significantly accelerates CNNs such as GoogLeNet (Szegedy et al. 2015) and ResNet (He et al. 2015) on several benchmarks. Decomposing a 3 × 3 convolution into two 1 × 1 convolutions was used in (Szegedy, Ioffe, and Vanhoucke 2016), which achieved state-of-the-art acceleration performance on object recognition. SqueezeNet (Iandola, Moskewicz, and Ashraf 2016) was proposed to replace 3 × 3 convolution with 1 × 1 convolution, which created a com...

[1] In this paper, we focus on the acceleration of the convolutional layers, as they take up over 80% of the running time in most existing CNNs, i.e. AlexNet, GoogLeNet and VGGNet.
[2] FLOP: the number of floating-point operations needed to classify one image with a CNN.

*Corresponding Author.
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
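
As a rough illustration of the first two steps named in the abstract (low-rank approximation of the filters, then skipping masked-out spatial locations), the sketch below uses NumPy. It is not the authors' implementation: the truncated SVD is a generic stand-in for their low-rank scheme, the mask is supplied by the caller because the saliency criterion is not described in this excerpt, and the function names `low_rank_factorize`, `masked_conv` and `madds_per_image` are hypothetical. Skipped positions are simply left at zero, which is also an assumption.

```python
import numpy as np


def low_rank_factorize(weights, rank):
    """Split a filter bank (out_ch, in_ch, k, k) into two factors by truncated SVD.

    Assumption: a plain SVD of the flattened filters stands in for the
    paper's low-rank approximation, which is not specified in this excerpt.
    """
    out_ch = weights.shape[0]
    flat = weights.reshape(out_ch, -1)            # (out_ch, in_ch*k*k)
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:rank]                             # (rank, in_ch*k*k) shared basis filters
    mix = u[:, :rank] * s[:rank]                  # (out_ch, rank) per-output mixing weights
    return mix, basis


def masked_conv(x, mix, basis, k, mask):
    """Apply the factored filters only where mask is True (stride 1, no padding).

    x has shape (in_ch, h, w); mask has shape (h-k+1, w-k+1).
    Positions where mask is False are left at zero (an assumption).
    """
    _, h, w = x.shape
    out = np.zeros((mix.shape[0], h - k + 1, w - k + 1))
    for i, j in zip(*np.nonzero(mask)):
        patch = x[:, i:i + k, j:j + k].reshape(-1)
        out[:, i, j] = mix @ (basis @ patch)      # rank-sized bottleneck, then expand
    return out


def madds_per_image(out_ch, in_ch, k, out_h, out_w, rank=None, keep_fraction=1.0):
    """Multiply-adds for one layer: dense if rank is None, otherwise factored."""
    if rank is None:
        per_position = out_ch * in_ch * k * k
    else:
        per_position = rank * in_ch * k * k + out_ch * rank
    return per_position * out_h * out_w * keep_fraction
```

With these hypothetical helpers, the cost of a layer falls for two separate reasons: a smaller `rank` shrinks the work per output position, and a sparser mask (a smaller `keep_fraction`) shrinks the number of positions visited. The third step in the abstract, fine-tuning, is what would recover the accuracy lost to both approximations, and it is not shown here.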


Similar resources

Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...


Estimation of Hand Skeletal Postures by Using Deep Convolutional Neural Networks

Hand posture estimation attracts researchers because of its many applications. Hand posture recognition systems simulate the hand postures by using mathematical algorithms. Convolutional neural networks have provided the best results in the hand posture recognition so far. In this paper, we propose a new method to estimate the hand skeletal posture by using deep convolutional neural networks. T...


Speeding up Convolutional Neural Networks with Low Rank Expansions

The focus of this paper is speeding up the application of convolutional neural networks. While delivering impressive results across a range of computer vision and machine learning tasks, these networks are computationally demanding, limiting their deployability. Convolutional layers generally consume the bulk of the processing time, and so in this work we present two simple schemes for drastica...


Provide a Deep Convolutional Neural Network Optimized with Morphological Filters to Map Trees in Urban Environments Using Aerial Imagery

Today, we cannot ignore the role of trees in the quality of human life, so that the earth is inconceivable for humans without the presence of trees. In addition to their natural role, urban trees are also very important in terms of visual beauty. Aerial imagery using unmanned platforms with very high spatial resolution is available today. Convolutional neural networks based deep learning method...


Cystoscopy Image Classification Using Deep Convolutional Neural Networks

In the past three decades, the use of smart methods in medical diagnostic systems has attracted the attention of many researchers. However, no smart activity has been provided in the field of medical image processing for diagnosis of bladder cancer through cystoscopy images despite the high prevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...



Publication date: 2017